Automatic Detection Of Mitosis Using Handcrafted And Convolutional Neural Network Features

ABSTRACT

Methods, apparatus, and other embodiments associated with detecting mitosis in breast cancer pathology images by combining handcrafted (HC) and convolutional neural network (CNN) features in a cascaded architecture are described. One example apparatus includes a set of logics that acquires an image of a region of tissue, partitions the image into candidate patches, generates a first probability that the patch is mitotic using an HC feature set and a second probability that the patch is mitotic using a CNN-learned feature set, and classifies the patch based on the first probability and the second probability. If the first and second probabilities do not agree, the apparatus trains a cascaded classifier on the CNN-learned feature set and the HC feature set, generates a third probability that the patch is mitotic, and classifies the patch based on a weighted average of the first probability, the second probability, and the third probability.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application 61/933,409 filed Jan. 30, 2014.

FEDERAL FUNDING NOTICE

The invention was made with government support under Federal Grant Numbers R01CA1364435-01, R01CA140772-01, NIH 1R21CA179327-01A1 and R21CA167811-01 awarded by the National Cancer Institute of the National Institutes of Health, award number R01DK098503-02 provided by the National Institute of Diabetes and Digestive and Kidney Diseases, the DOD Prostate Cancer Synergistic Idea Development Award PC120857, and DOD CDMRP Lung Cancer Research Idea Development Award New Investigator LC130463. The government has certain rights in the invention.

BACKGROUND

Breast cancer (BCa) grading plays an important role in predicting disease aggressiveness and patient outcome. Bloom Richardson grading is the most commonly used grading system for histopathologic diagnosis of invasive BCa. A key component of Bloom Richardson grading is mitotic count. Mitotic count, which refers to the number of dividing cells (e.g., mitoses) visible in a given area of hematoxylin and eosin (H&E) stained images, is an effective predictor of disease aggressiveness. Clinically, mitotic count is the number of mitotic nuclei identified visually in a fixed number of high power fields (HPF). Conventionally, mitotic nuclei are identified manually by a pathologist. Manual identification of mitotic nuclei suffers from poor inter-interpreter agreement due to the variable texture and morphology between mitoses. Manual identification is also a resource intensive and time consuming process that involves a trained pathologist manually inspecting and counting cells viewed in an HPF under a microscope. Manual identification is not optimal when trying to bring treatments to bear on a patient as quickly as possible in a clinically relevant timeframe.

Computerized detection of mitotic nuclei attempts to increase the speed, accuracy, and consistency of mitotic identification. However, the detection of mitotic nuclei in an H&E stained slide is a challenging task for an automated system. During mitosis, the cell nucleus undergoes various morphological transformations that lead to highly variable sizes and shapes across mitotic nuclei within the same image. Automated detection of mitotic nuclei is further complicated by rare event detection. Rare event detection complicates classification when one class (e.g., mitotic nuclei) is substantially less prevalent than the other class (e.g., non-mitotic nuclei).

Conventional approaches to computerized mitotic detection that employ manual annotation of candidate regions by an expert pathologist offer only limited improvements over manual detection, and still suffer from the problem of inter-interpreter disagreement. Some conventional approaches to computerized mitotic detection that try to minimize reliance on a human pathologist employ machine learning systems and methods. These systems and methods employ convolutional neural networks (CNN) to identify features and assist in mitotic detection. However, conventional CNN methods are computationally demanding. For example, one conventional method for mitotic detection employs an eleven-layered CNN. Since each layer is comprised of hundreds of units, this conventional method takes several days to analyze an image. Other conventional methods that employ CNN may take several weeks to train and test a classifier. Several weeks may be a sub-optimal time frame when administering timely treatment to a patient suffering from an aggressive form of cancer.

Conventional methods of automatic mitotic detection may employ handcrafted (HC) features. HC features identified by conventional techniques include various morphological, statistical, and textural features that attempt to model the appearance of mitosis in digitized images. However, while HC-feature-based classifiers may be faster than CNN-based classifiers, conventional HC feature-based classifiers are not as accurate as CNN-based classifiers, and fail to identify some features that CNN-based classifiers may detect. Conventional HC-based classifiers are highly dependent on the evaluation dataset used to train the HC-based classifier. Furthermore, HC-based classifiers lack a principled approach for combining disparate features. Thus, conventional CNN-based classification systems and methods of automatic mitotic detection are computationally intensive and may operate in time frames that are not optimal for clinical relevance when diagnosing patients (e.g., days or weeks instead of hours or minutes). Conventional HC-based classifiers, while faster than CNN-based classifiers, are not as accurate as CNN-based classifiers, suffer from a strong dependence on the training dataset, and are not optimally suited for combining disparate features.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example apparatus, methods, and other example embodiments of various aspects of the invention. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one example of the boundaries. One of ordinary skill in the art will appreciate that in some examples one element may be designed as multiple elements or that multiple elements may be designed as one element. In some examples, an element shown as an internal component of another element may be implemented as an external component and vice versa. Furthermore, elements may not be drawn to scale.

FIG. 1 illustrates an example method of detecting cellular mitosis in a region of cancerous tissue.

FIG. 2 illustrates an iteration of a method associated with detecting cellular mitosis in a region of cancerous tissue.

FIG. 3 is a flow chart illustrating the operation of a CNN used in example methods and apparatus.

FIG. 4 illustrates an example method of producing a classification of a region of interest.

FIG. 5 illustrates an example apparatus that detects mitosis in cancer pathology images.

FIG. 6 illustrates an example computer in which example methods and apparatus described herein operate.

DETAILED DESCRIPTION

Breast cancer (BCa) grading is an important tool in predicting cancer aggressiveness and patient outcome. Bloom-Richardson grading is the most commonly used method of BCa grading. Mitotic count is a key component of Bloom-Richardson BCa grading. Mitotic count involves quantifying the number of cells undergoing mitosis at a specific point in time in a specific region. Conventionally, mitotic count is determined by a human pathologist visually identifying mitotic nuclei in a fixed number of high power fields of H&E stained histopathology images. Manual identification suffers from poor inter-interpreter agreement due to the highly variable texture and morphology between mitoses. Manual identification is also laborious, expensive, and time consuming.

Conventional approaches to automated mitosis detection have been either HC feature based, or feature learning based. Commonly used HC features include various morphological, statistical, and textural features that attempt to model the appearance of the domain and in particular the appearance of mitoses within digitized images. While domain inspired (e.g., handcrafted) approaches allow for explicit modeling of features that human pathologists look for when identifying mitoses, HC feature based approaches still depend heavily on the evaluation dataset when identifying salient features. HC feature based approaches also lack a principled approach for combining disparate features.

Feature learning based approaches may employ CNNs. In contrast to HC feature based approaches, CNN based approaches are fully data driven. CNNs are multi-layer neural networks that learn a bank of convolutional filters at different layers. CNN-based approaches identify mitotic nuclei more accurately than HC feature based approaches, and are able to find feature patterns that HC features fail to describe. However, CNN approaches are computationally demanding and are sensitive to scalability of the training data. For example, conventional CNN-based approaches may use eleven or more layers to achieve a clinically useful level of accuracy. Conventional eleven layer CNNs require at least thirty epochs for training. While a useful level of accuracy may be achieved, it is not achieved in a clinically relevant timeframe, since each layer of a conventional eleven-layer model includes hundreds of units and requires several weeks for training and testing.

One conventional method of automated mitosis detection includes stacking a CNN-learned feature set with an HC feature set. (C. Malon and E. Cosatto, “Classification of mitotic figures with Convolutional Neural Networks and seeded blob features,” Journal of Pathology Informatics 4(1), 9 (2013)) (NEC). The NEC approach performed classification via the CNN features and HC features together. The NEC approach of stacking the CNN features and HC features together biased the classifier towards the feature set with the larger number of attributes, leading to a sub-optimal accuracy. The NEC approach also failed to capture attributes of mitotic nuclei in relation to their local context.

Example methods and apparatus detect mitosis through classifying regions of an image of cancerous tissue as mitotic or non-mitotic. Example methods and apparatus employ a cascaded approach when combining CNN features and HC features in automated mitosis detection. Example methods and apparatus perform classification with CNN features and HC features separately. Example methods and apparatus use a combined CNN and HC feature set for classification when confronted with confounding images. By employing a cascaded approach, example methods and apparatus are less prone to biasing the classifier towards the feature set with the larger number of attributes. In one embodiment, a three-layer CNN operates on at least an 80 pixel by 80 pixel patch size. HC features are extracted from a region of clusters of segmented nuclei that, in one embodiment, is less than or equal to 30 pixels by 30 pixels. Example methods and apparatus compute attributes of not only mitotic nuclei, but also of the local context for the mitotic nuclei. The local context around candidate mitoses facilitates correctly identifying mitosis. Conventional methods fail to capture local context.

Example methods and apparatus segment likely candidate mitosis regions from an image of cancerous tissue. Segmenting the image triages the image by removing obviously non-mitotic regions. Removing obviously non-mitotic regions reduces the computing resources, electricity, and time required to analyze an image of cancerous tissue. Time and resources are not wasted by analyzing regions that are obviously non-mitotic. Example methods and apparatus extract from the candidate mitosis regions a CNN-learned feature set and an HC feature set. The CNN-learned feature set and the HC feature set are extracted independently of each other. Example methods and apparatus construct independently trained classifiers using the HC feature set and the CNN-learned feature set. The HC trained classifier is trained on the HC feature set independently of the CNN trained classifier. The CNN trained classifier is trained on the CNN-learned feature set independently of the HC trained classifier.

Example methods and apparatus classify the candidate mitosis region. The CNN trained classifier classifies the candidate mitosis region independently, and the HC trained classifier classifies the candidate mitosis region independently. Example methods and apparatus employ a light CNN model in which the CNN has at least three layers. Example methods and apparatus classify the candidate region based on a probability the region is mitotic. The CNN trained classifier calculates a probability that a candidate mitosis region is mitotic. If the probability exceeds a threshold probability, the CNN trained classifier classifies the candidate region as mitotic. If the probability is equal to or less than the threshold probability, the CNN trained classifier classifies the candidate mitosis patch as non-mitotic. Independent of the CNN trained classifier, the HC trained classifier also calculates a probability that the candidate mitosis region is mitotic. If the probability exceeds a threshold probability, the HC trained classifier classifies the candidate region as mitotic. If the probability is equal to or less than the threshold probability, the HC trained classifier classifies the candidate mitosis patch as non-mitotic. If the HC trained classifier and the CNN trained classifier agree that the candidate mitosis region is mitotic, example methods and apparatus classify the candidate mitosis region as mitotic. If the HC trained classifier and the CNN trained classifier agree that the candidate mitosis region is non-mitotic, example methods and apparatus classify the candidate region as non-mitotic.

Example methods and apparatus classify the candidate mitosis region with a third, cascaded classifier when the HC trained classifier and the CNN trained classifier disagree in their classifications of the candidate mitosis region. For example, if the HC trained classifier classifies the candidate mitosis region as non-mitotic, while the CNN trained classifier classifies the candidate mitosis region as mitotic, the third classifier will classify the region. The third classifier is trained using a stacked feature set that is based on the HC feature set and the CNN feature set. The third classifier may be a second-stage Random Forest classifier. The third classifier calculates a probability that the candidate mitosis region is mitotic. Example methods and apparatus generate a final classification for the candidate mitosis region based on a weighted average of the outputs of the HC classifier, the CNN classifier, and the cascaded classifier. Example methods and apparatus thus improve on conventional approaches to automated mitosis detection by employing a cascaded approach to the combination of CNN and HC features, by learning multiple attributes that characterize mitosis based on the combination of CNN and HC features, and by achieving a highly accurate level of mitosis detection while minimizing the computing resources and time required. Example methods and apparatus identify mitotic nuclei with an F-measure of 0.7345, and do so faster than conventional methods.

Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a memory. These algorithmic descriptions and representations are used by those skilled in the art to convey the substance of their work to others. An algorithm, here and generally, is conceived to be a sequence of operations that produce a result. The operations may include physical manipulations of physical quantities. Usually, though not necessarily, the physical quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a logic, and so on. The physical manipulations create a concrete, tangible, useful, real-world result.

It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, and so on. It should be borne in mind, however, that these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, it is appreciated that throughout the description, terms including processing, computing, determining, and so on, refer to actions and processes of a computer system, logic, processor, or similar electronic device that manipulates and transforms data represented as physical (electronic) quantities.

Example methods may be better appreciated with reference to flow diagrams. While for purposes of simplicity of explanation, the illustrated methodologies are shown and described as a series of blocks, it is to be appreciated that the methodologies are not limited by the order of the blocks, as some blocks can occur in different orders and/or concurrently with other blocks from that shown and described. Moreover, less than all the illustrated blocks may be required to implement an example methodology. Blocks may be combined or separated into multiple components. Furthermore, additional and/or alternative methodologies can employ additional, not illustrated blocks.

FIG. 1 illustrates a computerized method 100 of detecting cellular mitosis in a region of cancerous tissue using a cascaded approach that combines a CNN model with handcrafted features. Method 100 includes, at 110, acquiring an image of a region of cancerous tissue. Acquiring the image includes acquiring electronic data, reading from a computer file, receiving a computer file, reading from a computer memory, or other computerized activity. In one embodiment, the image is an RGB (red, green, blue) color space image, and the image has dimensions of at least 2084 pixels × 2084 pixels. In one embodiment, the image may be acquired by scanning the image from a high power field (HPF) of a hematoxylin and eosin (H&E) stained tissue slide. The HPF represents at least a 512 × 512 μm region of tissue. In this embodiment, the HPF is acquired using a slide scanner and a multi-spectral microscope. The slide scanner may be, for example, an Aperio XT scanner with a resolution of 0.2456 μm per pixel. In other embodiments, the image may have different dimensions and may be acquired from other systems, including a microscope or an electron microscope. In still other embodiments, the HPF may represent a region of tissue with dimensions other than 512 × 512 μm.

Method 100 also includes, at 120, segmenting the image into at least one candidate mitosis patch. In one embodiment, segmenting the image into at least one candidate mitosis patch includes generating a blue-ratio image by converting the image from RGB color space to blue-ratio color space. Method 100 assigns a higher value to a pixel with a high blue intensity relative to the pixel's red and green components. Assigning a higher value to a pixel with a high blue intensity allows method 100 to highlight nuclei regions. Method 100, at 120, also computes a Laplacian of Gaussian (LoG) response of the blue-ratio image to discriminate the nuclei region of the candidate mitosis patch from the background. Method 100, also at 120, identifies candidate nuclei. Candidate nuclei are identified by integrating globally fixed thresholding and local dynamic thresholding. By segmenting the image into at least one candidate mitosis patch, method 100 creates the technical effect of triaging the image by removing regions that are highly unlikely to be mitotic. Method 100 may thus reduce the amount of computing resources and time required to analyze the image compared to conventional approaches.
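The blue-ratio conversion and the globally fixed thresholding described above may be illustrated with a minimal sketch. The description does not state the blue-ratio formula; the sketch assumes one definition commonly used in nuclei detection, BR = (100·B / (1 + R + G)) · (256 / (1 + R + G + B)). The function names and the threshold value of 50 are hypothetical, and the LoG response and local dynamic thresholding are omitted.

```python
def blue_ratio(r, g, b):
    """Blue-ratio value for one RGB pixel (channels in 0..255).

    Assumed formula (not given in the description): large when blue
    dominates red and green, as in hematoxylin-stained nuclei.
    """
    return (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))


def blue_ratio_image(rgb_image):
    """Convert a nested list of (r, g, b) tuples to blue-ratio values."""
    return [[blue_ratio(r, g, b) for (r, g, b) in row] for row in rgb_image]


def global_threshold(br_image, threshold):
    """Binary mask: 1 where the blue-ratio value exceeds a fixed threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in br_image]


# A bluish, nucleus-like pixel next to a pinkish, stroma-like pixel.
patch = [[(60, 40, 160), (230, 170, 200)]]
mask = global_threshold(blue_ratio_image(patch), threshold=50.0)  # hypothetical threshold
```

In this sketch the bluish pixel receives a much larger blue-ratio value than the pinkish pixel, so only the former survives the fixed threshold.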

Method 100 also includes, at 130, extracting a set of CNN-learned features from the candidate mitosis patch. To extract the set of CNN-learned features, method 100 employs a convolutional neural network (CNN). Unlike conventional methods that employ multi-layer CNNs that may take days or weeks to train and test, method 100 employs a light CNN model that has fewer layers than conventional methods. In one embodiment, the architecture of the CNN model includes three layers. The first layer is a convolutional-pooling layer that includes a first convolution layer and a first pooling layer. The first convolution layer has at least P units, where P is a number. A first pooling layer is connected to the first convolution layer. The first pooling layer has at least Q units, where Q is a number. The second layer is also a convolutional-pooling layer. The second layer includes a second convolutional layer and a second pooling layer. The second convolutional layer is connected to the first pooling layer and has at least X units, where X is a number greater than P and greater than Q. The second layer also includes a second pooling layer connected to the second convolutional layer. The second pooling layer has at least Y units, where Y is a number greater than P and greater than Q. The third layer is a fully-connected layer that is connected to the second pooling layer. The fully-connected layer has Z units, where Z is a number greater than Y. The CNN is connected to an output layer connected to the fully-connected layer. The output layer has at least two output units. The first output unit, when activated, indicates a mitosis classification. The second output unit, when activated, indicates a non-mitosis classification. In another embodiment, the first output unit indicates non-mitosis, and the second output unit indicates mitosis. In other embodiments, the CNN may include different numbers of layers.

In one embodiment, P is at least 64, Q is at least 64, X is at least 128, Y is at least 128, and Z is at least 256. For example, the first convolution layer has at least 64 units, and the first pooling layer has at least 64 units. The second convolution layer has at least 128 units, and the second pooling layer has at least 128 units. The fully connected layer has at least 256 units. In other embodiments, the layers may have other numbers of units. For example, the first convolution layer and first pooling layer may each have 32 units, the second convolution layer and the second pooling layer may each have 64 units, and the fully-connected layer may have 128 units.
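The relationships among the unit counts P, Q, X, Y, and Z may be captured in a small configuration check. The following is a sketch only; the class name LightCnnUnits and its field names are hypothetical and do not describe an actual network implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightCnnUnits:
    """Unit counts for the three-layer CNN (hypothetical container)."""
    conv1: int            # P
    pool1: int            # Q
    conv2: int            # X, greater than P and Q
    pool2: int            # Y, greater than P and Q
    fully_connected: int  # Z, greater than Y
    outputs: int = 2      # mitosis / non-mitosis output units

    def satisfies_constraints(self) -> bool:
        """Check the ordering constraints stated in the description."""
        return (
            self.conv2 > self.conv1 and self.conv2 > self.pool1
            and self.pool2 > self.conv1 and self.pool2 > self.pool1
            and self.fully_connected > self.pool2
            and self.outputs >= 2
        )


embodiment_a = LightCnnUnits(64, 64, 128, 128, 256)
embodiment_b = LightCnnUnits(32, 32, 64, 64, 128)
```

Both example embodiments (64/64/128/128/256 and 32/32/64/64/128) satisfy the stated ordering under this check.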

Method 100 also includes, at 132, training a CNN classifier using the set of CNN-learned features. Training the CNN classifier using the set of CNN-learned features includes generating a rotated mitosis patch by rotating the candidate mitosis patch. Training the CNN classifier also includes generating a mirrored mitosis patch by mirroring the candidate mitosis patch. Generating a rotated mitosis patch and a mirrored mitosis patch addresses the class-imbalance problem associated with conventional approaches, and also achieves rotational invariance. Method 100, also at 132, computes a log-likelihood that the candidate mitosis patch is mitotic. To compute the log-likelihood, method 100 minimizes a loss function by training the CNN classifier using the rotated mitosis patch, the mirrored mitosis patch, and the candidate mitosis patch. The CNN classifier is trained on the patches using Stochastic Gradient Descent. The loss function is defined as

${{L(x)} = {- {\log \left\lbrack \frac{^{x_{i}}}{\sum_{j}^{x_{j}}} \right\rbrack}}},$

where $x_{i}$ corresponds to the output of the fully-connected layer multiplied by a logistic model parameter. The output of the CNN, in the form of the log-likelihood that the candidate patch is mitotic, is thus the log-likelihood of class membership. A first class may represent mitotic nuclei, and a second class may represent non-mitotic nuclei. In other embodiments, other classes may be defined.
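The loss function above is the negative log of a softmax over the class scores. A minimal numerical sketch follows; the function names are hypothetical, and the maximum score is subtracted before exponentiation purely for numerical stability, which does not change the result.

```python
import math


def log_softmax(x):
    """Log of e^{x_i} / sum_j e^{x_j} for every class i."""
    m = max(x)  # subtracting the maximum avoids overflow in exp
    log_sum = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - log_sum for v in x]


def loss(x, i):
    """L(x) = -log(e^{x_i} / sum_j e^{x_j}) for the true class index i."""
    return -log_softmax(x)[i]


# Two classes: index 0 = mitotic, index 1 = non-mitotic (assumed ordering).
scores = [2.0, 0.0]
loss_if_mitotic = loss(scores, 0)
loss_if_non_mitotic = loss(scores, 1)
```

With these scores the loss is smaller when the true class is the one with the larger score, which is the behavior that training by stochastic gradient descent exploits.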

Method 100, at 134, generates a CNN classification score by classifying the candidate mitosis patch with the CNN classifier trained at step 132. Method 100, at 134, computes the probability that the candidate mitosis patch is mitotic. In one embodiment, method 100 computes the probability that the candidate mitosis patch is mitotic by applying an exponential function to the log-likelihood that the candidate patch is mitotic. Applying an exponential function to the log-likelihood results in a probability that a candidate mitosis patch is mitotic expressed as a real number between 0 and 1. Method 100, also at 134, assigns the probability that the candidate mitosis patch is mitotic to the CNN classification score. Method 100, also at 134, determines whether the probability that the candidate mitosis patch is mitotic is greater than a CNN threshold probability. If method 100 determines that the probability that the candidate mitosis patch is mitotic is greater than the CNN threshold probability, method 100 classifies the candidate mitosis patch as mitotic. If method 100 determines that the probability that the candidate mitosis patch is mitotic is less than or equal to the CNN threshold probability, method 100 classifies the candidate mitosis patch as non-mitotic. In one embodiment, the threshold probability is at least 0.58. In other embodiments, other threshold probabilities may be employed.
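The conversion from log-likelihood to probability and the comparison against the CNN threshold probability may be sketched as follows. The function names are hypothetical; the threshold of 0.58 is the value given for one embodiment.

```python
import math

CNN_THRESHOLD = 0.58  # threshold probability from one embodiment


def mitotic_probability(log_likelihood):
    """Exponentiate a log-likelihood (<= 0) to obtain a probability in (0, 1]."""
    return math.exp(log_likelihood)


def classify(probability, threshold=CNN_THRESHOLD):
    """Mitotic only when the probability strictly exceeds the threshold."""
    return "mitotic" if probability > threshold else "non-mitotic"


cnn_score = mitotic_probability(math.log(0.61))
label = classify(cnn_score)
```

A probability exactly equal to the threshold is classified as non-mitotic, consistent with the "less than or equal to" condition in the description.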

Method 100 also includes, at 140, extracting a set of hand-crafted (HC) features from the candidate mitosis patch. In one embodiment, the set of HC features extracted from the candidate mitosis patch includes a set of morphology features, a set of intensity features, and a set of texture features. In other embodiments, other sets of HC features may be extracted.

In one embodiment, at 140, method 100 generates a binary mask of the candidate mitosis patch by applying blue-ratio thresholding and local non-maximum suppression to the candidate mitosis patch. Method 100 then extracts the set of morphology features from the binary mask of the candidate mitosis patch. The set of morphology features represents various attributes of mitosis shape. In one embodiment of method 100, the set of morphology features extracted from the candidate mitosis patch includes area, eccentricity, equivalent diameter, Euler number, extent, perimeter, solidity, major axis length, minor axis length, area overlap ratio, average radial ratio, compactness, Hausdorff dimension, smoothness, or standard distance ratio. In another embodiment, other morphology features may be extracted.

Method 100 also includes, at 140, extracting the set of intensity features from a set of channels of the candidate mitosis patch. In one embodiment, the set of channels includes a blue-ratio channel, a red channel, a blue channel, a green channel, an L channel in LAB color space, a V channel in CIE 1976 (L*, u*, v*) (LUV) color space, or an L channel in LUV color space. The set of intensity features includes mean intensity, median intensity, variance, maximum/minimum ratio, range, interquartile range, kurtosis, or skewness. In another embodiment, the set of intensity features may be extracted from different channels, and the set of intensity features may include different intensity features.
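The eight intensity features listed above may be computed for a single channel as in the following sketch. The function name is hypothetical. The sketch assumes population (not sample) variance, moment-based skewness and non-excess kurtosis, and the default quartile method of the Python statistics module; the description does not specify these conventions.

```python
import statistics


def intensity_features(values):
    """Eight intensity statistics for the pixel values of one channel."""
    n = len(values)
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)  # population variance (assumption)
    q1, _, q3 = statistics.quantiles(values, n=4)     # quartile cut points
    third_moment = sum((v - mean) ** 3 for v in values) / n
    fourth_moment = sum((v - mean) ** 4 for v in values) / n
    lowest, highest = min(values), max(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": variance,
        "max_min_ratio": highest / lowest if lowest != 0 else float("inf"),
        "range": highest - lowest,
        "interquartile_range": q3 - q1,
        "kurtosis": fourth_moment / variance ** 2 if variance > 0 else 0.0,
        "skewness": third_moment / variance ** 1.5 if variance > 0 else 0.0,
    }


features = intensity_features([1, 2, 3, 4, 5])
```

Applying the function to each of the seven channels yields 8 × 7 = 56 intensity features per patch.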

Method 100 also includes, at 140, extracting the set of texture features from a set of channels of the candidate mitosis patch. In one embodiment, the set of channels includes a blue-ratio channel, a red channel, a blue channel, a green channel, an L channel in LAB color space, a V channel in CIE 1976 (L*, u*, v*) (LUV) color space, or an L channel in LUV color space. The set of texture features includes a subset of co-occurrence features and a subset of run-length features. The subset of co-occurrence features includes the mean and standard deviation of thirteen Haralick gray-level co-occurrence features obtained from the candidate mitosis patch. The thirteen Haralick gray-level co-occurrence features are obtained from the candidate mitosis patch at four orientations. The subset of run-length features includes the mean and standard deviation of a set of gray-level run-length matrices. The set of gray-level run-length matrices also corresponds to four orientations. In another embodiment, a different set of texture features may be extracted, or a different set of channels may be used.
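A gray-level co-occurrence matrix at four orientations, and one Haralick feature derived from it, may be sketched as follows. This is an illustration only: it computes a single feature (contrast) rather than all thirteen, it uses a pixel distance of one, and it assumes that the mean and standard deviation are taken across the four orientations. The function names and the OFFSETS table are hypothetical.

```python
# Row/column offsets for the 0, 45, 90, and 135 degree orientations.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, offset, levels):
    """Normalized co-occurrence matrix of gray levels for one offset."""
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        return counts
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum over (i, j) of (i - j)^2 * p[i][j]."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def contrast_mean_std(image, levels):
    """Mean and standard deviation of contrast across the four orientations."""
    values = [contrast(glcm(image, off, levels)) for off in OFFSETS.values()]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return mean, std


# Two gray levels: a dark row above a bright row.
mean_contrast, std_contrast = contrast_mean_std([[0, 0], [1, 1]], levels=2)
```

For the two-row example, contrast is zero along the horizontal orientation and nonzero along the other three, so the standard deviation across orientations is nonzero.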

In one embodiment of method 100, the dimensionality of the set of HC features is reduced. In this embodiment, the dimensionality of the set of HC features extracted at step 140 is 253, including 15 morphology features, 56 intensity features corresponding to the set of 8 intensity features extracted from 7 different channels, and 182 texture features. The 182 texture features correspond to the mean and standard deviation extracted from 13 Haralick gray-level co-occurrence features obtained at four orientations, and the mean and standard deviation of gray-level run-length matrices obtained at four orientations. The dimensionality of the set of HC features is reduced using principal component analysis (PCA) or minimum redundancy maximum relevance (mRMR) feature selection. An optimal set of features is selected from the reduced-dimensionality set of HC features. In one embodiment, if PCA is employed, the optimal set of HC features is obtained by keeping the components that account for 98% of the total variance. When mRMR is used to reduce the dimensionality of the set of HC features, in one embodiment, the top 160 features are selected. In another embodiment, other techniques to reduce the dimensionality of the HC feature set may be used.
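The feature count and the PCA retention rule may be illustrated with a short sketch. It assumes that "98%" refers to the cumulative fraction of variance explained by the leading principal components; the function name and the example eigenvalues are hypothetical.

```python
MORPHOLOGY_FEATURES = 15
INTENSITY_FEATURES = 8 * 7   # 8 intensity features from each of 7 channels
TEXTURE_FEATURES = 182       # total stated in the description
TOTAL_HC_FEATURES = MORPHOLOGY_FEATURES + INTENSITY_FEATURES + TEXTURE_FEATURES


def components_to_keep(eigenvalues, retained=0.98):
    """Smallest number of leading components whose variance share reaches `retained`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= retained:
            return k
    return len(ordered)


# Hypothetical eigenvalue spectrum; cumulative shares are 50%, 80%, 95%, 99%, 100%.
kept = components_to_keep([50.0, 30.0, 15.0, 4.0, 1.0])
```

With the hypothetical spectrum shown, four components are needed to reach the 98% level, since three components account for only 95%.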

Method 100 also includes, at 142, training the HC classifier using the set of HC features. In one embodiment, training the HC classifier includes reducing the number of non-mitotic nuclei by detecting a clustered set of overlapping nuclei in the candidate mitosis patch. The overlapping non-mitotic nuclei are replaced with the cluster center of the clustered set. The mitotic nuclei are then oversampled by applying a Synthetic Minority Oversampling Technique (SMOTE). Reducing the number of non-mitotic nuclei and oversampling the mitotic nuclei corrects the classification bias that occurs in conventional systems caused by the relatively small number of mitotic nuclei compared to non-mitotic nuclei.
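The oversampling step may be illustrated with a simplified SMOTE sketch, in which each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbors. The function names, the neighbor count k, and the fixed random seed are hypothetical choices for illustration, and the sketch requires at least two minority samples.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample, excluding itself
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Three hypothetical mitotic feature vectors in a 2-D feature space.
mitotic = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_samples = smote(mitotic, n_synthetic=5)
```

Because every synthetic sample is a convex combination of two existing minority samples, the new samples remain inside the region spanned by the original mitotic samples.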

Method 100 also includes, at 144, generating an HC classification score. Generating the HC classification score includes, in one embodiment, generating an output of a Random Forest classifier. The Random Forest classifier has at least 50 trees. The output of the Random Forest classifier is a probability that the candidate mitosis patch is mitotic. In other embodiments, more than 50 trees or fewer than 50 trees may be employed. Random Forest classifiers with more than 50 trees may cause overfitting, while Random Forest classifiers with fewer than 50 trees may lead to lower classification accuracy. The number of trees may be adjusted to achieve different F-measures. Method 100, at 144, assigns the output of the Random Forest classifier to the HC classification score. Upon determining that the output of the Random Forest classifier is greater than an HC threshold probability, method 100 classifies the candidate mitosis patch as mitotic. Upon determining that the output of the Random Forest classifier is less than or equal to the HC threshold probability, method 100 classifies the candidate mitosis patch as non-mitotic. In one embodiment, the HC threshold probability is 0.58. In another embodiment, other HC threshold probabilities may be used. The HC threshold probability may be adjusted to achieve different F-measures.

Method 100 also includes, at 150, producing a final classification. The final classification is based, at least in part, on both the CNN classification score and the HC classification score. Producing the final classification includes comparing the CNN classification score with the HC classification score. If the CNN classification score and the HC classification score are both within a threshold range, method 100 produces a final classification based, at least in part, on the CNN classification score and the HC classification score. The final classification score indicates the probability that the candidate mitosis patch is mitotic. In one embodiment, if both the CNN classification score and HC classification score are within a threshold range of (0.58, 1], method 100 produces a final classification score that indicates the candidate mitosis patch is mitotic. Alternately, if both the CNN classification score and the HC classification score are within a threshold range of [0, 0.58], method 100 produces a final classification score that indicates the candidate mitosis patch is non-mitotic. In other embodiments, other threshold ranges may be employed.

Producing the final classification may also include determining that the CNN classification score and the HC classification score are not both within a threshold range. For example, method 100 may, at 150, determine that the HC classification score and the CNN classification score disagree. The HC classification score may, for example, indicate that the candidate mitosis patch is non-mitotic. The HC classifier may have computed a probability of 0.4 that the candidate mitosis patch is mitotic, and assigned that probability to the HC classification score. The CNN classification score may indicate that the candidate mitosis patch is mitotic, the CNN classifier having assigned a probability of 0.61 that the candidate mitosis patch is mitotic to the CNN classification score. In this example, upon determining that the CNN classification score and the HC classification score disagree, method 100 trains a cascaded classifier. The cascaded classifier is trained using a stacked set of features. The stacked set of features includes the set of CNN-learned features and the set of HC features. The cascaded classifier generates a cascaded classification score for the candidate mitosis patch. In one embodiment, the cascaded classifier produces a probability that the candidate mitosis patch is mitotic. The final classification may, in this iteration of method 100, be based, at least in part, on a weighted average of the CNN classification score, the HC classification score, and the cascaded classification score. Computing the weighted average of the CNN classification score, the HC classification score, and the cascaded classification score results in a probability from 0 to 1 that the candidate mitosis patch is mitotic. The final classification indicates the probability that the candidate mitosis patch is mitotic.
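By way of illustration only, the weighted average described above may be sketched as follows using the example scores of 0.4 and 0.61. The cascaded classification score of 0.70 and the equal weights are hypothetical values chosen for the sketch; the description does not specify them, and the function name is likewise an assumption.

```python
def weighted_average(scores, weights):
    """Return the weighted mean of a sequence of probabilities.

    The weights need not sum to one; they are normalized here.
    """
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Scores taken from the example in the text.
HC_SCORE = 0.40
CNN_SCORE = 0.61

# Hypothetical values, not given in the description.
CASCADED_SCORE = 0.70
WEIGHTS = (1.0, 1.0, 1.0)

final_score = weighted_average((CNN_SCORE, HC_SCORE, CASCADED_SCORE), WEIGHTS)
```

Because each input is a probability and the weights are non-negative, the result remains in the range from 0 to 1, consistent with the description above.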

Method 100 also includes, at 160, controlling an automated nuclei detection system to classify the candidate mitosis patch. The automated nuclei detection system may classify the candidate mitosis patch as mitotic or non-mitotic, based, at least in part, on the final classification. By classifying the candidate mitosis patch as mitotic, the automated nuclei detection system has detected mitosis from among the non-mitotic regions of the image.

Method 100 also includes, at 170, generating a mitotic count. The mitotic count is generated by calculating the total number of mitotic nuclei classified by the automated nuclei detection system at step 160 for the image. The mitotic count may be used for grading the cancer represented in the image.

Method 100 also includes, at 180, controlling an automated cancer grading system to grade the image. The grade of a cancer is representative of the aggressiveness of the cancer. A key component of a cancer grade is the mitotic count. In one embodiment, method 100, at 180, provides the mitotic count generated at step 170 to an automated cancer grading system. The automated cancer grading system generates a cancer grade, based, at least in part, on the mitotic count. HPFs with lower mitotic counts may generate a lower cancer grade, while HPFs with higher mitotic counts may result in a higher cancer grade. In this embodiment, the automated cancer grading system uses a Bloom-Richardson grading approach. Bloom-Richardson grades are based, at least in part, on mitotic count. In a different embodiment, a grading approach other than Bloom-Richardson grading that uses mitotic count may be employed. In another embodiment, a human pathologist may use the mitotic count to manually grade a cancer specimen using a Bloom-Richardson approach, or another approach that employs mitotic count.

Using the features extracted at steps 130 and 140, example methods and apparatus employing a cascaded architecture of CNN and HC features provide faster and more accurate detection of mitotic nuclei than conventional systems. Improved classification and detection of mitotic nuclei may produce the technical effect of improving treatment efficacy and improving doctor efficiency by increasing the accuracy and decreasing the time required to grade a cancer specimen. Treatments and resources may be more accurately tailored to patients with more aggressive cancer so that more appropriate protocols may be employed. Using a more appropriate protocol may lead to fewer therapeutics being required for a patient or may lead to avoiding or delaying a resection or other invasive procedure.

In one embodiment, method 100 detects cellular mitosis in a region of BCa tissue. In other embodiments, other features may be extracted, and cellular mitosis in regions of other diseases may be detected. For example, cellular mitosis in regions of prostate cancer or other types of cancer may be detected. Improving detection of cellular mitosis in cancerous regions also produces the technical effect of improving treatment efficacy and improving doctor efficiency. When cellular mitosis is more quickly and more accurately detected, patients most at risk may receive a higher proportion of scarce resources (e.g., therapeutics, physician time and attention, hospital beds) while those less at risk may be spared unnecessary treatment, which in turn spares unnecessary expenditures and resource consumption. Example methods and apparatus may also have the concrete effect of improving patient outcomes.

While FIG. 1 illustrates various actions occurring in serial, it is to be appreciated that various actions illustrated in FIG. 1 could occur substantially in parallel. By way of illustration, a first process could segment candidate mitosis patches in the image, a second process could extract HC features, and a third process could extract CNN-learned features. While three processes are described, it is to be appreciated that a greater or lesser number of processes could be employed and that lightweight processes, regular processes, threads, and other approaches could be employed.

FIG. 2 illustrates an iteration of a computerized method 200 that is similar to method 100. FIG. 2 illustrates the cascaded nature of method 100. Method 200 includes, at 210, accessing an image of a region of pathological tissue. Accessing the image may include acquiring electronic data, reading from a computer file, receiving a computer file, reading from a computer memory, or other computerized activity. Method 200 also includes, at 220, segmenting the image into a candidate mitosis patch. In one embodiment, segmenting an image into a candidate mitosis patch includes generating a blue-ratio image by converting the image from RGB color space to a blue-ratio color space, computing a LoG response for the blue-ratio image, and identifying candidate nuclei by integrating globally fixed thresholding and local dynamic thresholding.
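By way of illustration only, the conversion of a single RGB pixel to a blue-ratio value may be sketched as follows. The description does not state the blue-ratio formula; the sketch assumes the form commonly used in the histopathology literature, BR = (100 × B / (1 + R + G)) × (256 / (1 + B + R + G)), and the function name is hypothetical. The LoG response and thresholding steps are not shown.

```python
def blue_ratio(r, g, b):
    """Blue-ratio value for one RGB pixel with channels in the range 0 to 255.

    Assumed formula (not stated in the description):
        BR = (100 * B / (1 + R + G)) * (256 / (1 + B + R + G))
    Large values indicate strongly blue (hematoxylin-stained) pixels.
    """
    return (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + b + r + g))


def blue_ratio_image(rgb_image):
    """Apply blue_ratio to every pixel of a nested list of (r, g, b) tuples."""
    return [[blue_ratio(r, g, b) for (r, g, b) in row] for row in rgb_image]
```

Under this assumed formula, a saturated blue pixel yields a far larger value than a white or pink pixel, which is why the transform may help accentuate nuclei before candidate detection.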

Method 200 also includes, at 230, extracting a set of CNN-learned features from the candidate mitosis patch using a CNN. In one embodiment, the CNN is a three-layer CNN including a first convolution-pooling layer, a second convolution-pooling layer, and a fully-connected layer. The first convolution-pooling layer includes a first convolution layer and a first pooling layer. The second convolution-pooling layer includes a second convolution layer and a second pooling layer. The first convolution layer has at least 64 units and the first pooling layer has at least 64 units. The second convolution layer has at least 128 units and the second pooling layer has at least 128 units. The fully-connected layer has at least 256 units. For each layer, a fixed 8×8 convolution kernel and 2×2 pooling kernel are used. The output of the CNN includes two units that accept as input the entire output of the fully-connected layer, and indicate the candidate mitosis patch is mitotic or non-mitotic. In other embodiments, different sized kernels may be used, and the layers may have different numbers of units. Method 200 also includes, at 232, training a CNN classifier using the set of CNN-learned features. Method 200 also includes, at 234, generating a CNN classification score by classifying the candidate mitosis patch with the CNN classifier. The CNN classification score indicates the probability the candidate mitosis patch is mitotic. In one embodiment, the CNN classifier generates a log-likelihood that the candidate mitosis patch is a member of a class (e.g., mitotic, non-mitotic). An exponential function is applied to the log-likelihood to generate a probability that the candidate mitosis patch is mitotic. If the probability exceeds a threshold probability, the CNN classifier classifies the candidate mitosis patch as mitotic. If the probability is equal to or less than the threshold probability, the CNN classifier classifies the candidate mitosis patch as non-mitotic. In one embodiment, the threshold probability is 0.58. In other embodiments, other threshold probabilities may be used.
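By way of illustration only, converting the log-likelihood produced by the CNN classifier into a probability and applying the threshold probability may be sketched as follows. The function name is hypothetical, and the sketch assumes the log-likelihood is the natural logarithm of a probability, so that it is less than or equal to zero.

```python
import math

THRESHOLD_PROBABILITY = 0.58  # value used in one embodiment


def classify_from_log_likelihood(log_likelihood, threshold=THRESHOLD_PROBABILITY):
    """Map a log-likelihood for the mitotic class to (probability, label).

    Assumes log_likelihood <= 0, i.e. it is the natural log of a probability.
    """
    probability = math.exp(log_likelihood)
    # Strictly greater than the threshold means mitotic; equal or less
    # means non-mitotic, as described in the text.
    label = "mitotic" if probability > threshold else "non-mitotic"
    return probability, label
```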

Method 200 also includes, at 240, extracting a set of HC features from the candidate mitosis patch. The set of HC features includes a set of morphology features, a set of intensity features, or a set of texture features. Method 200 also includes, at 242, training an HC classifier using the set of HC features. Training the HC classifier may include reducing non-mitotic nuclei by replacing overlapping non-mitotic nuclei with their clustered center, oversampling mitotic cells by applying SMOTE, and using an empirically selected threshold of 0.58.
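By way of illustration only, the oversampling of mitotic samples by SMOTE (synthetic minority over-sampling technique) may be sketched as follows. SMOTE creates each synthetic sample by interpolating between a minority-class sample and one of its nearest minority-class neighbors. The function name, the neighbor count of five, and the fixed random seed are assumptions made for the sketch and are not taken from the description.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic samples by interpolating between minority samples.

    minority: list of equal-length feature tuples (at least two samples).
    k: number of nearest neighbors considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base sample, excluding itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic
```

Each synthetic sample lies on the line segment between two existing mitotic samples, so the oversampled class stays within the region of feature space already occupied by mitotic samples.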

Method 200 also includes, at 244, generating an HC classification score by classifying the candidate mitosis patch using the HC classifier. The HC classification score indicates the probability the candidate mitosis patch is mitotic. In one example, if the probability that the candidate mitosis patch is mitotic exceeds a threshold probability, the HC classifier classifies the candidate mitosis patch as mitotic. If the probability that the candidate mitosis patch is mitotic is equal to or less than the threshold probability, the HC classifier classifies the candidate mitosis patch as non-mitotic. In one embodiment, the threshold probability is 0.58. In other embodiments, other threshold probabilities may be employed.

Method 200 also includes, at 250, comparing the HC classification score and the CNN classification score. If both the CNN classification score and the HC classification score are within a threshold range, then method 200 proceeds to block 260. In one embodiment, upon determining that both the CNN classification score and the HC classification score are above the threshold probability, method 200 continues to block 260. Upon determining that both the CNN classification score and the HC classification score are equal to or below the threshold probability, method 200 also continues to block 260. Thus, the CNN classification score and the HC classification score may both be within the threshold range (e.g., in agreement) when both the CNN classification score and the HC classification score are above a threshold probability, or when both are equal to or less than the threshold probability.

Upon determining that the CNN classification score and the HC classification score do not agree, method 200 continues to block 252. The CNN classification score and the HC classification score do not agree when one of the CNN classification score or the HC classification score is above the threshold probability, and the other, different classification score is equal to or below the threshold probability. For example, in an embodiment in which the threshold probability is 0.58, a CNN classification score of 0.2 and an HC classification score of 0.9 do not agree. In another embodiment, the CNN classification score may indicate a binary value of mitotic or non-mitotic, instead of or in addition to a probability. Similarly, the HC classification score may also indicate a binary value of mitotic or non-mitotic. In this embodiment, if one of the HC classification score or the CNN classification score indicates mitotic, and the other, different classification score indicates non-mitotic, then the classification scores do not agree, and method 200 proceeds to block 252.

Method 200 includes, at 252, training a cascaded classifier. The cascaded classifier is trained using a stacked set of features. The stacked set of features includes the set of HC features and the set of CNN-learned features. By training the cascaded classifier after detecting confounding image patches, rather than in all cases, example methods and apparatus reduce the amount of computing resources required by conventional methods that employ stacked feature sets. Method 200 also includes, at 254, generating a cascaded classification score. The cascaded classifier generates a cascaded classification score by classifying the candidate mitosis patch. The cascaded classification score indicates the probability that the candidate mitosis patch is mitotic.

Method 200 also includes, at 260, producing a final classification. If the HC classification score and the CNN classification score agree, the final classification is based, at least in part, on the CNN classification score and the HC classification score. If the HC classification score and the CNN classification score do not agree, the final classification is based, at least in part, on a weighted average of the CNN classification score, the HC classification score, and the cascaded classification score. By only training the cascaded classifier and classifying the candidate mitosis patch using the cascaded classifier when the CNN classification score and the HC classification score do not agree, method 200 improves on conventional methods that use a stacked feature set to classify nuclei. Combining the set of HC features and the set of CNN-learned features into the stacked set of features leverages the disconnected feature sets to improve classification accuracy when the individual HC classifier and CNN classifier are confounded.
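By way of illustration only, the control flow of blocks 250 through 260 may be sketched as follows, with the cascaded classifier invoked only when the two scores disagree. The function and parameter names are hypothetical, the cascaded classifier is represented by a caller-supplied function, and the equal default weights are an assumption. Comparing the weighted average against the same threshold probability follows the apparatus embodiment described elsewhere herein.

```python
from typing import Callable, Tuple

THRESHOLD_PROBABILITY = 0.58  # value used in one embodiment


def final_classification(
    cnn_score: float,
    hc_score: float,
    run_cascaded_classifier: Callable[[], float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical
    threshold: float = THRESHOLD_PROBABILITY,
) -> Tuple[str, bool]:
    """Return (label, cascade_used) for one candidate mitosis patch."""
    cnn_mitotic = cnn_score > threshold
    hc_mitotic = hc_score > threshold

    if cnn_mitotic == hc_mitotic:
        # Scores agree: the cascaded classifier is never trained or run.
        return ("mitotic" if cnn_mitotic else "non-mitotic"), False

    # Scores disagree: only now pay for the stacked-feature classifier.
    cascaded_score = run_cascaded_classifier()
    w_cnn, w_hc, w_cascaded = weights
    average = (
        w_cnn * cnn_score + w_hc * hc_score + w_cascaded * cascaded_score
    ) / (w_cnn + w_hc + w_cascaded)
    return ("mitotic" if average > threshold else "non-mitotic"), True
```

Passing the cascaded classifier as a function, rather than a precomputed score, mirrors the resource saving described above: the stacked feature set is only processed for confounding patches.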

Method 200 also includes, at 270, controlling an automated mitotic nuclei detection system to classify the candidate mitosis patch as mitotic or non-mitotic, based, at least in part, on the final classification. Method 200 also includes, at 280, generating a mitotic count by computing the total number of candidate mitosis patches classified as mitotic by the automated mitotic nuclei detection system. Method 200 also includes, at 290, controlling an automated cancer grading system to grade the image. In one embodiment, the automated cancer grading system grades the image using a Bloom-Richardson grade. The Bloom-Richardson grade is based, at least in part, on the mitotic count.

FIG. 3 is a flow chart illustrating the operation of a light CNN model that may be employed in example methods and apparatus. In one embodiment, the CNN described by FIG. 3 includes 3 layers. The CNN includes a first convolution-pooling layer, a second convolution-pooling layer, and a fully-connected layer. The first convolution-pooling layer includes a first convolution layer that has at least 64 units, and a first pooling layer that has at least 64 units. The second convolution-pooling layer has a second convolution layer that has at least 128 units, and a second pooling layer that has at least 128 units. The fully-connected layer has at least 256 units. In another embodiment, the layers may have different numbers of units.

FIG. 3 includes, at 310, generating a YUV color space image. The YUV color space image is generated by converting the image from RGB color space to YUV color space. The YUV color space image is normalized to a mean of zero and a variance of one. FIG. 3 also includes, at 320, extracting an input feature map from the YUV color space image. The input feature map becomes the input for the first layer of the CNN. FIG. 3 also includes, at 330, generating a first output feature map. The first output feature map is generated by applying, in the first convolution layer of the CNN, a two-dimensional (2D) convolution of the input feature map and a first convolution kernel. In one embodiment, the first convolution kernel is a fixed 8×8 convolutional kernel. The first output feature map becomes the input for the first pooling layer.
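By way of illustration only, the color space conversion and normalization at 310 may be sketched as follows. The description does not specify which YUV coefficients are used or whether normalization is applied per channel; the sketch assumes the BT.601 coefficients and per-channel normalization, and the function names are hypothetical.

```python
import statistics


def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV using BT.601 coefficients (an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def normalize(values):
    """Scale a list of channel values to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant channel carries no contrast; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def normalized_yuv_channels(rgb_pixels):
    """Return three normalized channel lists (Y, U, V) for a flat pixel list."""
    yuv = [rgb_to_yuv(r, g, b) for (r, g, b) in rgb_pixels]
    return [normalize([p[c] for p in yuv]) for c in range(3)]
```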

FIG. 3 also includes, at 340, generating a first pooled map. The first pooled map is generated by applying, in the first pooling layer in the CNN, an L2 pooling function over a spatial window. The spatial window is applied over the first output feature map. The L2 pooling function is applied without overlapping. An L2 pooling function facilitates optimal learning of invariant features in the spatial window. In one embodiment, a fixed 2×2 pooling kernel is used. The first pooled map becomes the input for the second convolution layer.
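By way of illustration only, non-overlapping L2 pooling over a 2×2 spatial window may be sketched as follows. The sketch assumes that L2 pooling takes the square root of the sum of squared values in each window; some formulations use the mean of the squares instead, and the description does not say which applies. The function name is hypothetical.

```python
import math


def l2_pool(feature_map, size=2):
    """Non-overlapping L2 pooling of a 2D feature map (list of equal-length rows).

    Each output value is sqrt(sum of squares) over one size x size window.
    Rows or columns that do not fill a complete window are dropped.
    """
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for i in range(rows):
        pooled_row = []
        for j in range(cols):
            window = [
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            ]
            pooled_row.append(math.sqrt(sum(x * x for x in window)))
        pooled.append(pooled_row)
    return pooled
```

Stepping the window by its own size is what makes the pooling non-overlapping, so a 2×2 kernel halves each spatial dimension of the feature map.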

FIG. 3 also includes, at 350, generating a second output feature map. The second output feature map is generated by applying, in the second convolution layer of the CNN, a 2D convolution of the first pooled map and a second convolution kernel. In one embodiment, the second convolution kernel is a fixed 8×8 convolution kernel. The second output feature map becomes the input for the second pooling layer.

FIG. 3 also includes, at 360, generating a second pooled map in the second pooling layer. The second pooled map is generated by applying an L2 pooling function over a spatial window. The spatial window is applied over the second output feature map. The L2 pooling function is applied without overlapping. In one embodiment, a fixed 2×2 pooling kernel is used. The second pooled map is used as the input for the fully-connected layer.

FIG. 3 also includes, at 370, generating a feature vector. The feature vector is generated by applying the second pooled map to the fully-connected layer of the CNN. FIG. 3 also includes, at 380, generating a fully-connected layer output by activating an output unit in the CNN. In one embodiment, the output of the fully-connected layer is two units. A first unit, when activated, indicates mitosis, and a second, different unit, when activated, indicates non-mitosis.
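By way of illustration only, the spatial size of each feature map produced by the steps of FIG. 3 may be traced as follows. The sketch assumes convolution without padding and with a stride of one, and non-overlapping pooling; neither the padding nor the input patch size is stated in this description, so the 80 pixel input used in the usage note is purely hypothetical. The function names are likewise assumptions.

```python
def conv_output_size(size, kernel=8):
    """Side length after a stride-1 convolution with no padding (assumed)."""
    out = size - kernel + 1
    if out < 1:
        raise ValueError("input is smaller than the convolution kernel")
    return out


def pool_output_size(size, pool=2):
    """Side length after non-overlapping pooling."""
    out = size // pool
    if out < 1:
        raise ValueError("input is smaller than the pooling kernel")
    return out


def trace_sizes(input_size, kernel=8, pool=2):
    """Return (stage name, side length) pairs for the two convolution-pooling layers."""
    conv1 = conv_output_size(input_size, kernel)
    pool1 = pool_output_size(conv1, pool)
    conv2 = conv_output_size(pool1, kernel)
    pool2 = pool_output_size(conv2, pool)
    return [
        ("input", input_size),
        ("first convolution", conv1),
        ("first pooling", pool1),
        ("second convolution", conv2),
        ("second pooling", pool2),
    ]
```

For example, trace_sizes(80) follows a hypothetical 80 pixel patch through both convolution-pooling layers; the final side length, squared and multiplied by the number of units in the second pooling layer, gives the number of values presented to the fully-connected layer under these assumptions.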

Example methods and apparatus improve the detection of cellular mitosis in cancerous tissue compared to conventional methods by employing a cascaded approach that uses a lighter CNN than those CNNs used by conventional methods. Lighter CNNs reduce the time and resources required to automatically detect cellular mitosis compared to conventional methods. Example methods and apparatus also leverage disconnected feature sets that are not exploited by conventional methods, which facilitates making more accurate predictions of patient prognosis. Improving patient prognosis prediction facilitates allocating resources, personnel, and therapeutics to appropriate patients while sparing patients from treatment that might have been prescribed with a less accurate prediction.

In one example, a method may be implemented as computer executable instructions. Thus, in one example, a computer-readable storage medium may store computer executable instructions that if executed by a machine (e.g., computer) cause the machine to perform methods described or claimed herein including method 100, method 200, method 300, and method 400. While executable instructions associated with the listed methods are described as being stored on a computer-readable storage medium, it is to be appreciated that executable instructions associated with other example methods described or claimed herein may also be stored on a computer-readable storage medium. In different embodiments the example methods described herein may be triggered in different ways. In one embodiment, a method may be triggered manually by a user. In another example, a method may be triggered automatically.

FIG. 4 illustrates an example method 400 for producing a classification of a region of interest. Method 400 includes, at 410, accessing an image of the region of interest. In one embodiment, the image is a whole slide image (WSI) of a region of invasive ductal carcinoma (IDC) tissue. In another embodiment, the image is an HPF of an H&E stained slide of a region of BCa tissue. In other embodiments, images acquired using other imaging techniques may be accessed.

Method 400 also includes, at 420, generating a first classification for the image. The first classification may be a probability that the image is a member of a class. The first classification is generated by a first classifier trained using a first feature set. In one embodiment, the first classifier is a light CNN that includes three layers. In one embodiment, the first layer includes a first convolution layer and a first pooling layer. The second layer includes a second convolution layer and a second pooling layer. The third layer includes a fully-connected layer. In one embodiment, the first convolution layer and the first pooling layer each have 32 units, the second convolution layer and the second pooling layer each have 64 units, and the fully-connected layer has 128 units. In another embodiment, the first convolution layer and the first pooling layer each have 64 units, the second convolution layer and the second pooling layer each have 128 units, and the fully-connected layer has 256 units. In yet other embodiments, the CNN may include more layers, or the layers may have different numbers of units.

Method 400 also includes, at 430, generating a second classification for the image. The second classification may be a probability that the image is a member of a class. The second classification is generated using a second classifier trained using a second feature set. The second feature set is distinguishable from the first feature set. In one embodiment, the second classifier is a fifty-tree Random Forest classifier and the second feature set is a handcrafted feature set. The second feature set may include a morphology feature subset, an intensity feature subset, or a texture feature subset. In other embodiments, other classifiers may be used, or the second feature set may include other feature subsets.

Method 400 also includes, at 440, determining if the first classification and the second classification are both within a threshold amount. In one embodiment, the first classification and the second classification are within a threshold amount if the first classification and the second classification both indicate the image is within a first class (e.g., mitotic). The first classification and the second classification may also be within a threshold amount if the first classification and the second classification both indicate the image is not in the first class, but is in a second class (e.g., non-mitotic). Upon determining that the first classification and the second classification are both within a threshold amount, method 400 produces, at 460, a final classification based, at least in part, on the first classification and the second classification.

Method 400, upon determining at 440 that the first classification and the second classification are not both within a threshold amount, proceeds to block 450. For example, method 400 may determine that the first classification indicates membership in the first class and that the second classification indicates membership in the second class. Method 400, at 450, generates a third classification using a third classifier trained using the first feature set and the second feature set. In one embodiment, the third classifier is trained using a stacked feature set that includes the first feature set and the second feature set. The third classification indicates the probability that the image is a member of a class. The dimensionality of the first feature set or the second feature set may be reduced using principal component analysis (PCA) or minimum redundancy maximum relevance (mRMR) feature selection.

Method 400 also includes, at 460, producing a final classification. The final classification indicates the probability that the image is a member of the class. In one embodiment, the final classification is based on a weighted average of the first classification, the second classification, and the third classification. Method 400 also includes, at 470, controlling an automated image classification system to classify the image based, at least in part, on the final classification. In one embodiment, the automated image classification system is an automated mitosis detection system used in grading breast cancer.

FIG. 5 illustrates an example apparatus 500 that detects mitosis in cancer pathology images. Apparatus 500 includes a processor 510, a memory 520, an input/output interface 530, a set of logics 540, and an interface 550 that connects the processor 510, the memory 520, the input/output interface 530, and the set of logics 540. The set of logics 540 includes an image acquisition logic 541, a CNN logic 543, an HC features logic 544, a first classification logic 545, a cascaded architecture logic 547, and a second classification logic 549.

Image acquisition logic 541 acquires an image of a region of tissue. The region of tissue may be a section of diseased tissue. The image may be a high-power field (HPF) representing a 512 μm by 512 μm area of the region of tissue. In one embodiment, image acquisition logic 541 acquires a digitally scanned H&E stain image magnified at 400×. The digitally scanned H&E stain image may be provided by an Aperio XT scanner. The Aperio XT scanner has a resolution of 0.2456 μm per pixel and generates a 2084 pixel by 2084 pixel RGB image of the HPF. In another embodiment, images that are made using other scanners, other staining techniques, or different magnification levels may be acquired. For example, the image may be provided by an optical microscope, or an automated slide staining system. Thus, accessing the image may include interacting with a scanning apparatus, an optical microscope, or an automated slide staining system. Other imaging systems may be used to generate and access the image accessed by image acquisition logic 541. Image acquisition logic 541 may segment the image into patches. A patch is smaller than the image.
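By way of illustration only, the relationship between the physical size of the HPF, the scanner resolution, and the pixel dimensions stated above may be checked as follows. The variable names are hypothetical, and truncating to whole pixels is an assumption about how the stated figure was obtained.

```python
HPF_SIDE_UM = 512.0           # side length of the HPF in micrometers
RESOLUTION_UM_PER_PX = 0.2456  # scanner resolution in micrometers per pixel

# Number of whole pixels along one side of the HPF.
side_in_pixels = int(HPF_SIDE_UM / RESOLUTION_UM_PER_PX)

# Physical length actually covered by that whole number of pixels.
covered_um = side_in_pixels * RESOLUTION_UM_PER_PX
```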

CNN logic 543 extracts a set of CNN-learned features from a patch of the image. CNN logic 543 generates a first probability that the patch is mitotic, based, at least in part, on the set of CNN-learned features. In one embodiment, CNN logic 543 uses a three-layer CNN. The three-layer CNN includes a first convolution and pooling layer, a second convolution and pooling layer, and a fully-connected layer. The fully-connected layer produces a feature vector. The fully-connected layer outputs a first unit and a second unit that are activated by a logistic regression model, based, at least in part, on the feature vector. The first unit indicates mitosis when activated, and the second unit indicates non-mitosis when activated.

HC features logic 544 extracts a set of HC features from the patch. HC features logic 544 generates a second probability that the patch is mitotic. The second probability is based, at least in part, on the set of HC features. In one embodiment, the set of HC features includes a subset of morphology features, a subset of intensity features, and a subset of texture features.

First classification logic 545 classifies the patch as mitotic when the first probability and the second probability both exceed a threshold probability. First classification logic 545 classifies the patch as non-mitotic when both the first probability and the second probability do not exceed the threshold probability. First classification logic 545 classifies the patch as confounding when one of the first probability or the second probability exceeds the threshold probability, and the other, different probability does not exceed the threshold probability.

In one embodiment, when first classification logic 545 has classified the patch as confounding, cascaded architecture logic 547 generates a third probability that the patch is mitotic. The third probability is based, at least in part, on a stacked set of features. The stacked set of features includes the set of CNN-learned features and the set of HC features.

Second classification logic 549 generates a fourth probability that the patch is mitotic. The fourth probability is based, at least in part, on a weighted average of the first probability, the second probability, and the third probability. In one embodiment, second classification logic 549 classifies the patch as mitotic when the weighted average is greater than the threshold probability and classifies the patch as non-mitotic when the weighted average is less than or equal to the threshold probability. In one embodiment, the threshold probability is 0.58. In another embodiment, second classification logic 549 may control a computer aided diagnosis (CADx) system to classify the image. For example, second classification logic 549 may control a computer aided breast cancer diagnostic system to grade the image based, at least in part, on the fourth probability. In other embodiments, other types of CADx systems may be controlled, including CADx systems for grading colon cancer, lung cancer, bone metastases, prostate cancer, and other diseases that are graded, at least in part, using mitotic count. Second classification logic 549 may control the CADx system to display the grade on a computer monitor, a smartphone display, a tablet display, or other displays. Displaying the grade may also include printing the grade. Second classification logic 549 may also control the CADx system to display an image of the detected mitotic nuclei.

FIG. 6 illustrates an example computer 600 in which example methods illustrated herein can operate and in which example logics may be implemented. In different examples computer 600 may be part of a digital whole slide scanner, may be operably connectable to a digital whole slide scanner, may be part of a microscope, may be operably connected to a microscope, or may be part of a CADx system.

Computer 600 includes a processor 602, a memory 604, and input/output ports 610 operably connected by a bus 608. In one example, computer 600 may include a set of logics 630 that perform a method of detecting cellular mitosis in a region of cancerous tissue using a cascaded approach to intelligently combining handcrafted features with convolutional neural networks. Thus, the set of logics 630, whether implemented in computer 600 as hardware, firmware, software, and/or a combination thereof, may provide means (e.g., hardware, software) for detecting cellular mitosis in a region of cancerous tissue using HC features and CNN-learned features. In different examples, the set of logics 630 may be permanently and/or removably attached to computer 600.

Processor 602 can be any of a variety of processors including dual microprocessor and other multi-processor architectures. Memory 604 can include volatile memory and/or non-volatile memory. A disk 606 may be operably connected to computer 600 via, for example, an input/output interface (e.g., card, device) 618 and an input/output port 610. Disk 606 may include, but is not limited to, devices like a magnetic disk drive, a tape drive, a Zip drive, a flash memory card, or a memory stick. Furthermore, disk 606 may include optical drives like a CD-ROM or a digital video ROM drive (DVD ROM). Memory 604 can store processes 614 or data 616, for example. Disk 606 or memory 604 can store an operating system that controls and allocates resources of computer 600.

Bus 608 can be a single internal bus interconnect architecture or other bus or mesh architectures. While a single bus is illustrated, it is to be appreciated that computer 600 may communicate with various devices, logics, and peripherals using other busses that are not illustrated (e.g., PCIE, SATA, Infiniband, 1394, USB, Ethernet).

Computer 600 may interact with input/output devices via I/O interfaces 618 and input/output ports 610. Input/output devices can include, but are not limited to, digital whole slide scanners, an optical microscope, a keyboard, a microphone, a pointing and selection device, cameras, video cards, displays, disk 606, network devices 620, or other devices. Input/output ports 610 can include, but are not limited to, serial ports, parallel ports, or USB ports.

Computer 600 may operate in a network environment and thus may be connected to network devices 620 via I/O interfaces 618 or I/O ports 610. Through the network devices 620, computer 600 may interact with a network. Through the network, computer 600 may be logically connected to remote computers. The networks with which computer 600 may interact include, but are not limited to, a local area network (LAN), a wide area network (WAN), or other networks.

References to “one embodiment”, “an embodiment”, “one example”, and “an example” indicate that the embodiment(s) or example(s) so described may include a particular feature, structure, characteristic, property, element, or limitation, but that not every embodiment or example necessarily includes that particular feature, structure, characteristic, property, element or limitation. Furthermore, repeated use of the phrase “in one embodiment” does not necessarily refer to the same embodiment, though it may.

“Computer-readable storage medium”, as used herein, refers to a medium that stores instructions or data. “Computer-readable storage medium” does not refer to propagated signals. A computer-readable storage medium may take forms, including, but not limited to, non-volatile media, and volatile media. Non-volatile media may include, for example, optical disks, magnetic disks, tapes, and other media. Volatile media may include, for example, semiconductor memories, dynamic memory, and other media. Common forms of a computer-readable storage medium may include, but are not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, an application specific integrated circuit (ASIC), a compact disk (CD), other optical medium, a random access memory (RAM), a read only memory (ROM), a memory chip or card, a memory stick, and other media from which a computer, a processor or other electronic device can read.

“Logic”, as used herein, includes but is not limited to hardware, firmware, software in execution on a machine, or combinations of each to perform a function(s) or an action(s), or to cause a function or action from another logic, method, or system. Logic may include a software controlled microprocessor, a discrete logic (e.g., ASIC), an analog circuit, a digital circuit, a programmed logic device, a memory device containing instructions, and other physical devices. Logic may include one or more gates, combinations of gates, or other circuit components. Where multiple logical logics are described, it may be possible to incorporate the multiple logical logics into one physical logic. Similarly, where a single logical logic is described, it may be possible to distribute that single logical logic between multiple physical logics.

To the extent that the term “includes” or “including” is employed in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term “comprising” as that term is interpreted when employed as a transitional word in a claim.

Throughout this specification and the claims that follow, unless the context requires otherwise, the words ‘comprise’ and ‘include’ and variations such as ‘comprising’ and ‘including’ will be understood to be terms of inclusion and not exclusion. For example, when such terms are used to refer to a stated integer or group of integers, such terms do not imply the exclusion of any other integer or group of integers.

To the extent that the term “or” is employed in the detailed description or claims (e.g., A or B) it is intended to mean “A or B or both”. When the applicants intend to indicate “only A or B but not both” then the term “only A or B but not both” will be employed. Thus, use of the term “or” herein is the inclusive, and not the exclusive use. See, Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d. Ed. 1995).

While example systems, methods, and other embodiments have been illustrated by describing examples, and while the examples have been described in considerable detail, it is not the intention of the applicants to restrict or in any way limit the scope of the appended claims to such detail. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the systems, methods, and other embodiments described herein. Therefore, the invention is not limited to the specific details, the representative apparatus, and illustrative examples shown and described. Thus, this application is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims.

What is claimed is:
 1. A non-transitory computer-readable storage medium storing computer-executable instructions that when executed by a computer control the computer to perform a method for detecting cellular mitosis in a region of cancerous tissue, the method comprising: acquiring an image of cancerous tissue; segmenting the image into a candidate mitosis patch; extracting a set of convolutional neural network (CNN) learned features from the candidate mitosis patch using a CNN; training a CNN classifier using the set of CNN-learned features; generating a CNN classification score by classifying the candidate mitosis patch with the CNN classifier; extracting a set of hand-crafted (HC) features from the candidate mitosis patch; training an HC classifier using the set of HC features; generating an HC classification score by classifying the candidate mitosis patch with the HC classifier; producing a final classification based, at least in part, on both the CNN classification score and the HC classification score; controlling an automated mitotic nuclei detection system to classify the candidate mitosis patch as mitotic or non-mitotic based on the final classification; generating a mitotic count by summing the number of candidate mitosis patches classified as mitotic by the automated mitotic nuclei detection system; and controlling an automated cancer grading system to grade the image using a Bloom-Richardson grade, where the Bloom-Richardson grade is based, at least in part, on the mitotic count.
 2. The non-transitory computer-readable storage medium of claim 1, where producing the final classification comprises: comparing the CNN classification score to the HC classification score, and upon determining that the CNN classification score and the HC classification score are within a threshold range: producing a final classification, based, at least in part, on the CNN classification score and the HC classification score, where the final classification indicates the probability that the mitosis patch is mitotic.
 3. The non-transitory computer-readable storage medium of claim 1, where producing the final classification comprises: comparing the CNN classification score to the HC classification score, and upon determining that the CNN classification score and the HC classification score are not within a threshold range: training a cascaded classifier using a stacked set of features, where the stacked set of features comprises the set of CNN-learned features and the set of HC features; generating a cascaded classification score by classifying the candidate mitosis patch with the cascaded classifier, and producing a final classification, based, at least in part, on a weighted average of the CNN classification score, the HC classification score, and the cascaded classification score, where the final classification indicates the probability that the mitosis patch is mitotic.
 4. The non-transitory computer-readable storage medium of claim 1, where acquiring the image comprises scanning an image from a high power field (HPF) of a hematoxylin and eosin (H&E) stained tissue slide, where the HPF represents at least a 512×512 μm region of tissue, where the HPF is acquired using a slide scanner and a multi-spectral microscope, where the image is a RGB color space image, and where the image has dimensions of at least 2084 pixels×2084 pixels.
 5. The non-transitory computer-readable storage medium of claim 4, where segmenting the image into a candidate mitosis patch comprises: generating a blue-ratio image by converting the image from RGB color space to a blue-ratio color space; computing a Laplacian of Gaussian (LoG) response for the blue-ratio image, and identifying a candidate nuclei by integrating globally fixed thresholding and local dynamic thresholding.
 6. The non-transitory computer-readable storage medium of claim 5, where the CNN comprises: a first convolutional layer that has at least P units, P being a number; a first pooling layer connected to the first convolutional layer, where the first pooling layer has at least Q units, Q being a number; a second convolutional layer connected to the first pooling layer, where the second convolutional layer has at least X units, X being a number greater than P and greater than Q; a second pooling layer connected to the second convolutional layer, where the second pooling layer has Y units, Y being a number greater than P and greater than Q; a fully-connected layer connected to the second pooling layer, where the fully-connected layer has Z units, Z being a number greater than Y and greater than X, and an output layer connected to the fully-connected layer, where the output layer has at least two output units.
 7. The non-transitory computer-readable storage medium of claim 6, where P is at least 64, where Q is at least 64, where X is at least 128, where Y is at least 128, and where Z is at least 256.
 8. The non-transitory computer-readable storage medium of claim 7, where extracting the set of CNN learned features from the candidate mitosis patch comprises: generating a YUV color space image by converting the image from RGB color space to YUV color space and by normalizing the YUV color space image to a mean of zero and a variance of one; extracting an input feature map from the YUV color space image; generating a first output feature map by applying, in the first convolution layer of the CNN, a two dimensional (2D) convolution of the input feature map and a first convolution kernel; generating a first pooled map by applying, in the first pooling layer in the CNN, an L2 pooling function over a spatial window applied over the first output feature map, where the L2 pooling function is applied without overlapping; generating a second output feature map by applying, in the second convolution layer of the CNN, a 2D convolution of the first pooled map and a second convolution kernel; generating a second pooled map by applying, in the second pooling layer of the CNN, an L2 pooling function over a spatial window applied over the second output feature map, where the L2 pooling function is applied without overlapping; generating a feature vector by applying the second pooled map to the fully-connected layer of the CNN, and generating a fully-connected layer output by activating an output unit in the CNN based, at least in part, on a logistic regression model and the feature vector, where the output unit is one of a mitosis unit or a non-mitosis unit.
 9. The non-transitory computer-readable storage medium of claim 8, where training the CNN classifier using the set of CNN-learned features comprises: generating a rotated mitosis patch by rotating the candidate mitosis patch; generating a mirrored mitosis patch by mirroring the candidate mitosis patch, and computing a log-likelihood that the candidate mitosis patch is mitotic, where computing the log-likelihood comprises minimizing a loss function by training the CNN classifier using the rotated mitosis patch, the mirrored mitosis patch, and the candidate mitosis patch using a Stochastic Gradient Descent, where the loss function is described by: $L(x) = -\log\left[\frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}}\right],$ where $x_{i}$ corresponds to the fully-connected layer output multiplied by a logistic model parameter.
 10. The non-transitory computer-readable storage medium of claim 9, where generating the CNN classification score by classifying the candidate mitosis patch with the CNN classifier comprises: computing the probability that the candidate mitosis patch is mitotic by applying an exponential function to the log-likelihood that the candidate mitosis patch is mitotic, where the probability that the candidate mitosis patch is mitotic is a real number between 0 and 1; assigning the probability that the candidate mitosis patch is mitotic to the CNN classification score; upon determining that the probability that the candidate mitosis patch is mitotic is greater than a CNN threshold probability, where the CNN threshold probability is at least 0.58, classifying the CNN classification score as mitotic, and upon determining that the probability that the candidate mitosis patch is mitotic is less than or equal to the CNN threshold probability, classifying the CNN classification score as non-mitotic.
 11. The non-transitory computer-readable storage medium of claim 1, where extracting the set of HC features from the candidate mitosis patch comprises: extracting a set of morphology features from the candidate mitosis patch; extracting a set of intensity features from the candidate mitosis patch, and extracting a set of texture features from the candidate mitosis patch.
 12. The non-transitory computer-readable storage medium of claim 11, where the set of morphology features includes area, eccentricity, equivalent diameter, Euler number, extent, perimeter, solidity, major axis length, minor axis length, area overlap ratio, average radial ratio, compactness, Hausdorff dimension, smoothness, or standard distance ratio.
 13. The non-transitory computer-readable storage medium of claim 12, the method comprising generating a binary mask by applying blue-ratio thresholding and local non-maximum suppression to the candidate mitosis patch and extracting the set of morphology features from the binary mask of the candidate mitosis patch.
 14. The non-transitory computer-readable storage medium of claim 13, the method comprising extracting the set of intensity features from a set of channels of the candidate mitosis patch, the set of channels including a blue-ratio channel, a red channel, a blue channel, a green channel, an L channel in LAB color space, a V channel in CIE 1976 (L*, u*, v*) (LUV) color space, or an L channel in LUV color space, and where the set of intensity features includes mean intensity, median intensity, variance, maximum/minimum ratio, range, interquartile range, kurtosis, or skewness.
 15. The non-transitory computer-readable storage medium of claim 14, the method comprising extracting the set of texture features from a set of channels of the candidate mitosis patch, the set of channels including a blue-ratio channel, a red channel, a blue channel, a green channel, an L channel in LAB color space, a V channel in CIE 1976 (L*, u*, v*) (LUV) color space, or an L channel in LUV color space, where the set of texture features includes a subset of concurrence features and a subset of run-length features, where the subset of concurrence features includes the mean and standard deviation of 13 Haralick gray-level concurrence features obtained from the candidate mitosis patch at four orientations, and where the subset of run-length features includes the mean and standard deviation of a set of gray-level run-length matrices, where the set of gray-level run-length matrices correspond to four orientations.
 16. The non-transitory computer-readable storage medium of claim 15, the method comprising: reducing the dimensionality of the set of HC features, where the dimensionality of the set of HC features is reduced using principal component analysis (PCA) or minimum redundancy maximum relevance (mRMR) feature selection, and selecting an optimum set of HC features.
 17. The non-transitory computer-readable storage medium of claim 16, where the optimum set of HC features comprises 98% of the component variations or the top 160 features selected using mRMR feature selection.
 18. The non-transitory computer-readable storage medium of claim 16, where training the HC classifier using the set of HC features comprises: reducing the number of non-mitotic nuclei by detecting a clustered center of overlapping nuclei; replacing the overlapping non-mitotic nuclei with the clustered center, and oversampling mitotic nuclei by applying a Synthetic Minority Oversampling Technique (SMOTE).
 19. The non-transitory computer-readable storage medium of claim 18, where generating the HC classification score comprises: generating an output of a Random Forest classifier, where the Random Forest classifier has at least 50 trees, where the output is a probability that the candidate mitosis patch is mitotic; assigning the output of the Random Forest classifier to the HC classification score; upon determining that the output of the Random Forest classifier is greater than an HC threshold probability, where the HC threshold probability is at least 0.58, classifying the HC classification score as mitotic, and upon determining that the output of the Random Forest classifier is less than or equal to the HC threshold probability, classifying the HC classification score as non-mitotic.
 20. A non-transitory computer-readable storage medium storing computer-executable instructions that when executed by a computer control the computer to perform a method for producing a classification of a region of interest, the method comprising: accessing an image of the region of interest; generating a first classification for the image using a first classifier trained using a first feature set, where the first classifier is a light convolutional neural network (CNN); generating a second classification for the image using a second classifier trained using a second feature set, where the second feature set is distinguishable from the first feature set, and where the second feature set is a hand-crafted (HC) feature set; upon determining that the first classification and the second classification are within a threshold amount, producing a final classification for the image based, at least in part, on the first classification and the second classification; upon determining that the first classification and the second classification are not within the threshold amount, generating a third classification for the image using a third classifier trained using the first feature set and the second feature set, and producing the final classification for the image based, at least in part, on a weighted average of the first classification, the second classification, and the third classification, and controlling an automated image classification system to classify the image based on the final classification.
 21. An apparatus for detecting mitosis in cancer pathology images, comprising: a processor; a memory; an input/output interface; a set of logics; and an interface to connect the processor, the memory, the input/output interface and the set of logics, the set of logics comprising: an image acquisition logic that acquires an image of a region of tissue; a convolutional neural network (CNN) logic that extracts a set of CNN-learned features from a patch of the image and generates a first probability that the patch is mitotic, based, at least in part, on the set of CNN-learned features, where the patch of the image is smaller than the image; a hand-crafted (HC) features logic that extracts a set of HC features from the patch and generates a second probability that the patch is mitotic, based, at least in part, on the set of HC features; a first classification logic that classifies the patch as mitotic when the first probability and the second probability are both greater than a threshold probability, classifies the patch as non-mitotic when the first probability and the second probability are both less than or equal to the threshold probability, and classifies the patch as confounding when one of the first probability or the second probability is greater than the threshold probability and the other, different probability is less than or equal to the threshold probability; a cascaded architecture logic that generates a third probability that the patch is mitotic when the first classification logic classified the patch as confounding, and generates the third probability that the patch is mitotic based, at least in part, on a stacked set of features, where the stacked set of features includes the set of CNN-learned features and the set of HC features, and a second classification logic that generates a fourth probability that the patch is mitotic based, at least in part, on a weighted average of the first probability, the second probability, and the third probability, where the second classification logic classifies the patch as mitotic when the weighted average is greater than the threshold probability, and classifies the patch as non-mitotic when the weighted average is less than or equal to the threshold probability.
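
By way of illustration only, and without limiting the claims, the decision flow recited for the first classification logic, the cascaded architecture logic, and the second classification logic of claim 21 may be sketched in Python as follows. The function name `classify_patch`, the callable `cascaded_prob`, and the equal default weights are assumptions introduced for this sketch; the claims do not specify the weights of the weighted average. The default threshold of 0.58 is taken from the threshold probability recited in claims 10 and 19.

```python
from typing import Callable, Optional, Sequence, Tuple


def classify_patch(
    p_cnn: float,
    p_hc: float,
    cascaded_prob: Callable[[], float],
    threshold: float = 0.58,
    weights: Sequence[float] = (1.0, 1.0, 1.0),  # assumption: equal weights
) -> Tuple[str, Optional[float]]:
    """Classify one candidate patch as "mitotic" or "non-mitotic".

    p_cnn          first probability (from the CNN-learned features)
    p_hc           second probability (from the hand-crafted features)
    cascaded_prob  hypothetical callable returning the third probability from
                   a classifier trained on the stacked CNN + HC features; it
                   is only invoked for confounding patches
    Returns the label and the weighted average (None when no cascade ran).
    """
    cnn_mitotic = p_cnn > threshold
    hc_mitotic = p_hc > threshold

    # First classification logic: both probabilities fall on the same side
    # of the threshold, so the patch is labeled directly.
    if cnn_mitotic == hc_mitotic:
        return ("mitotic" if cnn_mitotic else "non-mitotic"), None

    # Confounding patch: obtain the third probability from the cascade.
    p_cascade = cascaded_prob()

    # Second classification logic: weighted average of the three
    # probabilities, normalized so the weights need not sum to one.
    w_cnn, w_hc, w_cascade = weights
    total = w_cnn + w_hc + w_cascade
    p_final = (w_cnn * p_cnn + w_hc * p_hc + w_cascade * p_cascade) / total

    label = "mitotic" if p_final > threshold else "non-mitotic"
    return label, p_final
```

In this sketch the cascaded classifier is passed as a callable so that the stacked-feature model is evaluated only for confounding patches, mirroring the claim language in which the third probability is generated when the first two probabilities disagree.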